fix: OLMES matching effort (MC Task Suite) #182

Open
fsschneider wants to merge 17 commits into main from OLMES_matching

fsschneider commented Feb 24, 2026

PR Checklist

  • Use descriptive commit messages.
  • Provide tests for your changes.
  • Update any related documentation and include any relevant screenshots.
  • Check if changes need to be made to docs (README or any guides in /docs/).

What type of PR is this? (check all applicable)

  • Refactor
  • Feature
  • Bug Fix
  • Optimization
  • Documentation Update

Description

The changes are part of an "OLMES matching" effort. The changes include:

  • csqa.py: CommonsenseQAMC_OLMES: Sets FEWSHOT_SPLIT = "train" (was inheriting "validation" from parent)
  • medqa.py: MedQAMC_OLMES: Sets FEWSHOT_SPLIT = "train" (was inheriting "dev" from parent)
  • sciq.py: SCIQ_OLMES: Sets FEWSHOT_SPLIT = "train" (was inheriting "test" from parent)
  • piqa.py: PIQA_OLMES: Sets FEWSHOT_SPLIT = "train" and changes the instruction text from "Question: {goal}\n..." to "Goal: {goal}\n..." to match the prompt OLMES uses.
  • drop.py:
    • New class DropCompletion_OLMES: Subclass of DropCompletion with FEWSHOT_SPLIT = "train" and max_tokens = 100 (vs. 50 in base)
    • DropMC: Added _get_cue_text returning "Answer:" (previously inherited empty string "" from BaseTask)
  • Evaluation split: where OLMES evaluates on all splits, our _OLMES variant uses the largest of those splits. This affects csqa, piqa, sciq, and social_iqa.
  • social_iqa.py: Added missing "Answer:" prefix to fewshot samples, via _get_fewshot_target_text
  • task_names.py: Registers DropCompletion_OLMES
  • task-prompts-hashes.json: Added the hash for DropCompletion_OLMES and updated the hashes of tasks whose few-shot split changed.
  • Updated the documentation.
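
The override pattern behind most of the items above can be sketched as follows. This is a minimal, self-contained illustration, not the repository's code: the class bodies are stand-ins, the parent class name `CommonsenseQAMC`, storing `max_tokens` as a class attribute, and the helper `largest_split` are assumptions made for the example, and the split sizes in the usage note are toy numbers.

```python
from __future__ import annotations

# Illustrative stand-ins only. The real classes live in the repository and
# their full interfaces are not shown in this PR description.


class BaseTask:
    """Hypothetical minimal base task."""

    def _get_cue_text(self) -> str:
        # Per the description, BaseTask returns an empty cue by default.
        return ""


class CommonsenseQAMC(BaseTask):
    """Assumed parent; the description says it uses "validation"."""

    FEWSHOT_SPLIT = "validation"


class CommonsenseQAMC_OLMES(CommonsenseQAMC):
    """OLMES variant: draw few-shot examples from the train split."""

    FEWSHOT_SPLIT = "train"


class DropCompletion(BaseTask):
    """Stand-in for the existing task; max_tokens as a class attribute is assumed."""

    max_tokens = 50


class DropCompletion_OLMES(DropCompletion):
    """OLMES variant: train few-shot split and a longer generation budget."""

    FEWSHOT_SPLIT = "train"
    max_tokens = 100


class DropMC(BaseTask):
    """Stand-in showing the cue-text override described for DropMC."""

    def _get_cue_text(self) -> str:
        return "Answer:"


def largest_split(split_sizes: dict[str, int]) -> str:
    """Hypothetical helper: return the name of the split with the most examples."""
    return max(split_sizes, key=lambda name: split_sizes[name])
```

With toy sizes such as `{"train": 3, "validation": 1, "test": 2}`, `largest_split` returns `"train"`. Keeping each OLMES variant as a thin subclass leaves the original tasks and their prompt hashes unchanged, which is presumably why only the tasks whose few-shot split changed needed new hashes.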

Added/updated tests?

  • Yes
  • No, and this is why: please replace this line with details on why tests
    have not been included
  • I need help with writing tests

fsschneider self-assigned this Feb 24, 2026
fsschneider changed the title from "WIP fix: OLMES matching effort" to "fix: OLMES matching effort" Feb 27, 2026
fsschneider changed the title from "fix: OLMES matching effort" to "fix: OLMES matching effort (MC Task Suite)" Feb 27, 2026
fsschneider marked this pull request as ready for review February 27, 2026 12:48